Chromatin immunoprecipitation followed by deep sequencing (ChIP-seq) experiments are widely used to determine, 
within entire genomes, the occupancy sites of any protein of interest, including, for example, transcription factors, RNA 
polymerases, or histones with or without various modifications. In addition to allowing the determination of occupancy 
sites within one cell type and under one condition, this method allows, in principle, the establishment and comparison of 
occupancy maps in various cell types, tissues, and conditions. Such comparisons require, however, that samples be 
normalized. Widely used normalization methods that include a quantile normalization step perform well when factor 
occupancy varies at a subset of sites, but may miss uniform genome-wide increases or decreases in site occupancy. We 
describe a spike adjustment procedure (SAP) that, unlike commonly used normalization methods intervening at the 
analysis stage, entails an experimental step prior to immunoprecipitation. A constant, low amount from a single batch of 
chromatin of a foreign genome is added to the experimental chromatin. This "spike" chromatin then serves as an internal 
control to which the experimental signals can be adjusted. We show that the method improves similarity between
replicates and reveals biological differences including global and largely uniform changes.

[Supplemental material is available for this article.] 



In chromatin immunoprecipitation (ChIP) followed by deep sequencing
(ChIP-seq) (Barski et al. 2007; Johnson et al. 2007a;
Mikkelsen et al. 2007), chromatin is first treated, within intact
cells, with a cross-linking reagent such as formaldehyde. The cross-linked
chromatin is then isolated and fragmented, often by sonication,
and used as starting material for immunoprecipitations
with antibodies directed against the factors of interest. The
immunoprecipitated material, containing the protein targeted by the
antibody as well as any DNA cross-linked to it, is heated to reverse
the cross-links, the DNA is purified, and an amplified representation
of this DNA is submitted to deep sequencing. Deep sequencing
generates sequence "tags" of commonly 35 to ~100 nucleotides
(nt), which are then aligned onto the genome. Genomic
regions enriched in aligned tags over noise (variously defined in
different works) are interpreted as regions of factor occupancy. This
method has proven immensely powerful in characterizing chromatin
organization, i.e., in identifying sites bound, for example, by
transcription factors, by histones carrying (or not) specific modifications,
or by RNA polymerases.

Apart from identifying regions of factor occupancy within
a single chromatin sample, ChIP-seq is invaluable for comparing
the level of occupancy at a set of loci (e.g., previously identified
targets of a specific transcription factor) between different chromatin
samples from various cell types or tissues, from cells subjected
to different conditions, or from cells at different developmental
stages. Unlike measurements of mature mRNAs,
such experiments inform on changes occurring at the very first
steps of gene expression, i.e., changes in chromatin structure and
gene transcription. In such experiments, however, reliable sample
normalization has proven difficult.

There are a number of different methods to normalize ChIP-seq
samples, including scaling to total amounts of tags (i.e., normalizing
for sequencing depth), quantile normalization, and other
methods. Scaling to the total amount of sequence tags that can be
aligned onto the genome is usually the first step (for examples, see
Li et al. 2011; Landt et al. 2012; Le Martelot et al. 2012). Quantile
normalization is also broadly applied because it can reveal differences
at specific loci even in samples displaying relatively uniform
global differences at all enriched loci (for examples, see Rahl et al.
2010; Le Martelot et al. 2012). Scaling to total tag amounts and
quantile normalization can have very different effects, particularly
in cases of global differences; for example, if all regions enriched in
one sample are uniformly enriched to a higher or lower degree in
another sample. With just scaling to total amount of aligned tags,
these differences might persist, but their interpretation will be
difficult because scaling will not distinguish whether the differences
result from technical experimental variations or from
genuine biological differences. With quantile normalization, on
the other hand, the distributions of the various samples are made
identical so that they can be easily compared, thus masking any
uniform changes, whether genuine or not, from one sample to
another.

©2014 Bonhoure et al. This article, published in Genome Research, is available

Here we describe a spike adjustment procedure (SAP) designed
to allow comparison of occupancy levels for a set of loci
of interest. Unlike the methods above, this method addresses the problem
via an experimental procedure conducted prior to immunoprecipitation.
It consists of adding a constant, low amount of a single
batch of foreign chromatin (e.g., human) as an internal control
to each sample of the chromatin of interest (e.g., mouse) before
immunoprecipitation. This allows adjustment of the signals
in each sample to the internal control. We show that unlike
only scaling to the total amount of aligned sequence tags or
quantile normalization, the SAP allows the scoring of global
and largely uniform changes when they result from biological
differences.

Results 

To illustrate the approach, Figure 1A shows a hypothetical experiment
in which all ChIP-seq peaks are higher in a first sample (light
blue) compared with a second sample (purple). The global change
in peak size could in principle be the result of a biological difference,
e.g., occupancy in the second sample might be reduced
because of some change in cell metabolism; or it might reflect
a technical problem such as reduced immunoprecipitation efficiency.
In this hypothetical example, scaling to total amount of
tags maintains the differences because the total number of sequenced
and aligned tags is roughly similar in both samples (Fig.
1B). In contrast, scaling followed by quantile normalization reveals
almost no differences between the samples (Fig. 1C), because the
peaks in the second experiment are more or less uniformly lower
than in the first. Thus, in this example the two methods give different
results and do not allow one to distinguish between the two
scenarios, genuine biological difference or technical variability.
The two scenarios should be distinguishable, however, by adjustment
to an internal reference. In panels D and E, the same
experiment is shown as in panel A but this time with a spike
signal, symbolized by the yellow bars. Replicates (or similar biological
samples) displaying apparently different occupancy due to
technical problems would display a lower spike signal in the
second sample and, after normalization to this internal reference
by the SAP, little or no change (Fig. 1F). In contrast, biologically
different samples would reveal a global negative fold change
(Fig. 1G).
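
The hypothetical experiment of Figure 1 can be reproduced with a few toy numbers. The following Python sketch is illustrative only: the peak heights, the spike signals, and the helper names (`quantile_normalize`, `spike_adjust`, `median_log2_fc`) are assumptions made for this example, not values or code from the study, and spike adjustment is reduced here to a single ratio of spike signals.

```python
import math

def quantile_normalize(samples):
    """Give every sample the same distribution: the value at each rank is
    replaced by the mean of the values holding that rank across samples."""
    n = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [sum(s[r] for s in sorted_samples) / len(samples) for r in range(n)]
    normalized = []
    for s in samples:
        order = sorted(range(n), key=lambda i: s[i])  # indices, lowest to highest peak
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized

def spike_adjust(peaks, spike_signal, spike_reference):
    """Rescale peaks so that the sample's spike signal matches the reference spike."""
    factor = spike_reference / spike_signal
    return [p * factor for p in peaks]

def median_log2_fc(first, second):
    """Median log2 fold change of the second sample relative to the first."""
    fcs = sorted(math.log2(b / a) for a, b in zip(first, second))
    mid = len(fcs) // 2
    return fcs[mid] if len(fcs) % 2 else (fcs[mid - 1] + fcs[mid]) / 2

# Hypothetical depth-scaled peak heights: sample 2 is uniformly halved.
sample1 = [100.0, 200.0, 400.0, 800.0]
sample2 = [50.0, 100.0, 200.0, 400.0]

fc_scaled = median_log2_fc(sample1, sample2)        # scaling only: difference persists
qn1, qn2 = quantile_normalize([sample1, sample2])
fc_quantile = median_log2_fc(qn1, qn2)              # quantile normalization: difference erased

# Technical scenario: the spike signal is halved as well, so the factor is 2.
fc_technical = median_log2_fc(sample1, spike_adjust(sample2, 5.0, 10.0))
# Biological scenario: the spike signal is unchanged, so the factor is 1.
fc_biological = median_log2_fc(sample1, spike_adjust(sample2, 10.0, 10.0))
```

Under these assumptions, scaling alone reports a log2 fold change of -1 in both scenarios and quantile normalization reports 0 in both; only the spike-adjusted values separate the technical case (0) from the biological case (-1).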

Addition of different percentages of human chromatin 
to mouse chromatin 

An internal reference is most useful when included as early as
possible in an experimental procedure. We therefore sought to
include the internal reference before the immunoprecipitation
step, which is one of the steps likely to generate variation from one
sample to another in the ChIP-seq protocol. We tested the usefulness
of adding spikes of human chromatin to mouse chromatin
samples for ChIP-seq experiments performed with two antibodies:
one directed against POLR3D (RPC4), a subunit of RNA polymerase
(Pol) III; and the other against POLR2B (RPB2), the second largest
subunit of Pol II (for a list of the samples used in this work and
their nomenclature, see Table 1). Both antibodies are directed
against peptides that are 100% conserved in mouse and human
Pol III and Pol II, respectively. We first focused on experiments
using the anti-POLR3D antibody and tested mixing different
amounts of human chromatin with the mouse chromatin, with
the aim of using the smallest possible amount of human chromatin
so as to avoid unnecessary contamination of the mouse
sample, and yet obtaining a robust signal on a sufficient number
of human genes.

The various mixtures were used for ChIP-seq and the
resulting 100-nt-long sequence tags were aligned with both the
mouse (NCBI37/mm9) and human (GRCh37/hg19) genomes.
Supplemental Table S1, A and B, lists the tag counts that aligned
to the mouse genome, the human genome, or to both genomes
(ambiguous tags). Adding 2.5%, 5%, or 10% human chromatin
derived from HeLa cells resulted in an increase in the number
of reads aligning to the human genome, as expected, but had
little influence on the amount of ambiguous reads, indicating
that most of the ambiguous reads originate from the mouse
chromatin, which is not surprising since this chromatin represents
in all cases most of the material. Moreover, the ambiguous
tags represented only a small proportion of the tags mapping to
human Pol III regions (see Supplemental Table S1B [sheet 2], last
column) such that there was little loss of sensitivity in the
human signal due to the exclusion of tags that cannot be unequivocally
mapped to the mouse or human genomes. For
subsequent analyses, we thus used 2.5% human chromatin,
as this amount produced a usable signal on human genes (see
below).
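
The three tag classes described above (mouse, human, ambiguous) amount to simple set operations on tag identifiers. The sketch below assumes that each tag has an identifier and that two separate alignments each report the identifiers they could place; the function name `partition_tags` and the identifiers are hypothetical.

```python
def partition_tags(mouse_aligned, human_aligned):
    """Split tag identifiers into mouse-only, human-only, and ambiguous sets.

    A tag is ambiguous when it aligns to both genomes; such tags are
    excluded from both species-specific sets.
    """
    mouse_aligned, human_aligned = set(mouse_aligned), set(human_aligned)
    ambiguous = mouse_aligned & human_aligned
    return {
        "mouse": mouse_aligned - ambiguous,
        "human": human_aligned - ambiguous,
        "ambiguous": ambiguous,
    }

# Hypothetical tag identifiers reported by the two alignments.
mouse_hits = ["t1", "t2", "t3", "t4"]
human_hits = ["t4", "t5"]
tags = partition_tags(mouse_hits, human_hits)
```

Here `t4` aligns to both genomes and is therefore set aside, leaving three mouse tags and one human tag for the downstream steps.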

Spiking samples allows quality control 

Figure 2 summarizes the steps in the SAP. After tag alignment to the 
human and the mouse genomes and removal of ambiguous tags, 
we first tested whether the human spike signal can be used for 
quality control evaluation. Indeed, since the human chromatin 
added to the experimental mouse samples is constant from one 
sample to another, the quality of the human signal should in 
principle attest to the technical quality of the experiment, unlike 
experimental mouse samples where the mouse signal may vary 
according to biological differences. We thus compared a sample 
generated with our standard protocol (90_R1) (see Table 1) to 
a "poor" sample (97.5_P1) (see Table 1) in which we deliberately 
contaminated the immunoprecipitated material by adding back 
1.5% of the supernatant obtained after immunoprecipitation. 
Figure 3 shows, for each of these two spiked-in samples, a mean-difference
scatter plot comparing spike human tag counts in 400-bp
genomic bins obtained in the ChIP versus the input. The red
dots indicate bins that overlap with what we refer to hereafter as
"Pol III loci," i.e., annotated Pol III genes (whether occupied by Pol
III or not) as well as previously identified Pol III-occupied loci (see
Table S2 in Renaud et al. 2014). With the standard protocol (upper
panel), many of the bins overlapping with spike human Pol III loci
showed strong enrichment in the ChIP sample with respect to the
input. In contrast, the poor sample (lower panel) showed almost
no enrichment. Thus, the amount of signal in human Pol III loci
reflects sample quality, as expected, and can be used for quality
control. A quantitative metric to characterize signal content can be
the percentage of tags aligning in gene regions. Indeed, as shown
in Supplemental Table S1B (sheet 2), the percentage of human tags
in human Pol III loci was 8.7- to 16.7-fold lower for the poor
97.5_P1 sample compared with the standard 97.5_R1 or any of the
other standard replicate (R) samples (column G). Together with
visual inspection of scatter plots as shown above, this information
can be used to identify samples that should be discarded (and
experiments that should be redone).

Figure 1. Normalization can obscure global effects. (A) Schematic representation of peaks obtained after ChIP-seq in a hypothetical example where all
peaks are uniformly diminished in the second (purple) sample compared with the first (light blue). These samples can represent a replicate experiment, in
which case the overall decrease observed in the second sample is the result of experimental variation, or they can represent experiments performed with
samples collected under different conditions, in which case the global decrease might reflect a biological difference. No spike chromatin is included. (B)
Normalization by scaling to total number of tags aligned onto the genome (i.e., normalization for sequencing depth) showing tag counts (top) and log2
fold change (bottom). In this hypothetical example, the number of tags aligned onto the genome is quite similar in both samples, and this type of
normalization indicates a general decrease for each peak in the second sample, whether the two samples are biologically different (and thus should indeed
indicate a protein occupancy decrease in sample 2) or similar (and thus should in fact display similar signals). (C) Normalization by scaling followed by
quantile normalization showing tag counts (top) and log2 fold change (bottom). In this example, the second step (quantile normalization) will equalize
the sample distributions whether the samples are biologically different or not, because the decrease in sample 2 is uniform. In D and E, spike chromatin is
included in the sample and gives rise to signals symbolized by the yellow bars. (F,G) Normalization by scaling followed by spike adjustment showing tag
counts (top) and log2 fold change (bottom). In F, the spike adjustment factor increased the signals in sample 2 by a factor of about two; in G, the spike
adjustment factor decreased the signal in sample 2 by a factor of about 0.8 (see yellow bars). Spike adjustment reveals whether the samples are in fact
similar (example in F) or are in fact biologically different (example in G).

Table 1. Samples used in this work

Sample name | Mouse chromatin (%) | Human chromatin (%) | Antigen | Amount of antibody used (µL) | Number of sonication cycles

The numbers in the names (97.5, 95, or 90) refer to the percentage of mouse chromatin in the sample (the rest corresponds to human chromatin). R1 or
R2 refer to technical replicates (separate immunoprecipitations performed with the same chromatin sample); P1 refers to a "poor" sample in which the
immunoprecipitation conditions were changed with the aim of reducing efficiency; and S5, S10, and S15 refer to different sonication conditions. mR1_WT
and mR2_WT, as well as mR1_KO and mR2_KO, are in each case replicate ChIP-seq performed at a 10-mo interval with the same batch of wild-type (WT)
and Maf1 knockout (KO) mouse liver chromatin. The replicates were spiked with the same batch of human chromatin. All samples were immunoprecipitated
with antibodies directed against the POLR3D subunit of Pol III, except for the samples labeled "RPB2," which were immunoprecipitated with
antibodies directed against the POLR2B subunit of Pol II.
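
The quality-control metric described in this section, the percentage of spike tags that fall within Pol III loci, can be sketched as follows. The coordinates, the tag lists, and the function names (`percent_in_loci`, `fold_lower`) are hypothetical and chosen only to illustrate the calculation; no threshold for discarding a sample is implied.

```python
def percent_in_loci(tags, loci):
    """Percentage of tags whose position lies inside any locus.

    tags: list of (chromosome, position)
    loci: list of (chromosome, start, end), start inclusive, end exclusive
    """
    if not tags:
        return 0.0
    inside = sum(
        any(chrom == c and start <= pos < end for c, start, end in loci)
        for chrom, pos in tags
    )
    return 100.0 * inside / len(tags)

def fold_lower(reference_percent, sample_percent):
    """How many times lower the sample's percentage is than the reference's."""
    return reference_percent / sample_percent

# Hypothetical human Pol III loci and human-aligned tags for two samples.
loci = [("chr1", 1000, 1400), ("chr6", 5000, 5400)]
good_tags = [("chr1", 1100), ("chr1", 1200), ("chr6", 5100), ("chr2", 50)]
poor_tags = [("chr1", 1100), ("chr2", 50), ("chr3", 70), ("chr4", 90)]

good_pct = percent_in_loci(good_tags, loci)
poor_pct = percent_in_loci(poor_tags, loci)
drop = fold_lower(good_pct, poor_pct)
```

A sample whose spike signal in known loci is many-fold lower than that of standard replicates would be flagged for closer inspection, in the same spirit as the comparison reported for the poor sample.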

The next step after assessing sample quality consisted of scaling
the samples relative to the total number of aligned tags (scaling
for sequencing depth) in each experiment, which was performed 
separately for the mouse and human tags (Fig. 2, step 2). We then 
selected signal loci (step 3; see Methods) and calculated scores for 
the human sample and preliminary scores for the mouse sample 
(step 4). We then used the sample-to-sample differences in human 
signals to compute a spike adjustment factor for each sample. This 
spike adjustment factor was applied to the preliminary scores of the 
mouse Pol III loci (for a list of these loci, see Table S3 in Renaud et al. 
2014) to obtain final scores (step 5; see Methods). 
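
Steps 2, 4, and 5 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure defined in Methods: scores are taken to be on a linear scale, and the spike adjustment factor is assumed to be the ratio of the median human score in a chosen reference sample to the median human score in the sample being adjusted. All names and numbers are hypothetical.

```python
from statistics import median

def scale_to_depth(locus_counts, total_aligned_tags, per=1_000_000):
    """Step 2: express locus counts per `per` aligned tags of the same species."""
    return {locus: count * per / total_aligned_tags
            for locus, count in locus_counts.items()}

def spike_adjustment_factor(human_scores, human_reference_scores):
    """Assumed definition: median reference human score / median sample human score,
    computed over the human loci scored in both samples."""
    shared = sorted(set(human_scores) & set(human_reference_scores))
    return (median(human_reference_scores[l] for l in shared)
            / median(human_scores[l] for l in shared))

def apply_spike_adjustment(mouse_preliminary_scores, factor):
    """Step 5: final mouse scores = preliminary mouse scores x adjustment factor."""
    return {locus: score * factor
            for locus, score in mouse_preliminary_scores.items()}

# Hypothetical human (spike) counts; human tags are scaled to human totals only.
human_ref = scale_to_depth({"h1": 40, "h2": 80, "h3": 120}, total_aligned_tags=2_000_000)
human_smp = scale_to_depth({"h1": 10, "h2": 20, "h3": 30}, total_aligned_tags=1_000_000)

factor = spike_adjustment_factor(human_smp, human_ref)
final_scores = apply_spike_adjustment({"m1": 5.0, "m2": 7.5}, factor)
```

In this toy case the spike signal of the sample is half that of the reference, so the factor is 2 and the preliminary mouse scores are doubled.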

Effect of sonication on the spike signal 

As sonication is performed before addition of the spiking material,
a possible problem with the SAP might arise as a result of sonication
of the human and mouse samples to different average fragment
sizes. This is illustrated in Figure 4A. In this example, the
human chromatin, which is from a single batch, is sonicated to an
average size of 500 bp. In contrast, the first mouse chromatin is
sonicated to a larger average size (upper panel), whereas the second
sample is sonicated to a smaller average size (lower panel). Size
selection of DNA fragments from 200 to 400 bp during library
preparation (indicated by the rectangle) would result in a smaller
percentage of mouse chromatin in the first case compared with the
second case. This problem should be in large part circumvented by
the first normalization step, in which we scale independently
the human and mouse signals to the total number of aligned sequence
tags.

To directly test the effects of different average sizes of the
mouse sample, we sonicated mouse chromatin for 5, 10, and 15
cycles. As expected, increasing the number of sonication cycles
resulted in shorter average mouse chromatin fragment lengths, as
visualized after analysis on a Bioanalyzer 2100 from Agilent (Fig.
4B, upper panel, lanes S5, S10, S15) or after agarose gel electrophoresis
(lower panel). Figure 4B also shows the human spike
chromatin, which was fragmented less completely than the mouse
samples but nevertheless contained an abundance of fragments
<1000 bp.

Despite the variable length distributions of the mouse chromatin
samples, the SAP did not disrupt the data, and the agreement
between samples remained very high in all cases, as illustrated by the
scatter plots in Figure 4C (for Pearson and Spearman correlations,
see Figure 4 legend). Thus, spike adjustment is largely insensitive to
differences in sample sonication.
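
The Pearson and Spearman correlations used to compare pairs of samples can be computed as sketched below. This is a plain-Python illustration with hypothetical function names; in practice a statistics library would normally be used, and the input lists would be the locus scores of the two samples being compared.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed scores."""
    return pearson(average_ranks(x), average_ranks(y))
```

Pearson correlation is sensitive to the linear relation between scores, whereas Spearman correlation depends only on their rank order, which is why both are informative when checking that an adjustment has not distorted the data.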

Spike adjustment both improves similarity between biological 
replicates and reveals biological differences 

To test the usefulness of the SAP to both improve similarity between
replicates and reveal biological differences, we made use of
two experiments that are part of an independent study (N Bonhoure, 
V Praz, RD Moir, IM Willis, and N Hernandez, unpubl.). In these 
experiments, which were performed at different times, before and 
after upgrade of the sequencer, but with the same batches of mouse 
and human chromatin, we compared Pol III occupancy in the liver 
of wild-type (WT) mice (samples mR1_WT and mR2_WT) (see
Table 1) and mice lacking the Maf1 gene (mR1_KO and mR2_KO).
MAF1 is a repressor of Pol III transcription, both in yeast (Pluta 
et al. 2001; Upadhya et al. 2002) and in mammalian cells (Reina 
et al. 2006; Johnson et al. 2007b; Rollins et al. 2007), which 
prevents transcription complex assembly by binding to Pol III as 
well as to BRF1, a member of the Pol III preinitiation complex 
(Desai et al. 2005; Oficjalska-Pham et al. 2006; Reina et al. 2006; 
Goodfellow and White 2007; Vannini et al. 2010). In the absence 
of MAF1, one might expect a difference in Pol III occupancy at 
Pol III loci. 

We first compared the replicates before and after spike adjustment
(step 5 in Fig. 2). As shown in the scatter plots in Figure 5,
A and B, the scores for Pol III-occupied loci were closer to the x = y
line after (black) than before (orange) spike adjustment, both for
the replicate samples from WT mice (panel A) and for those of the
Maf1 KO mice (panel B).

Figure 2. Schematic diagram summarizing the SAP. The main steps,
i.e., examination of sample quality, scaling to total amount of genome-aligned
tags, selection of signal genes, score calculation, and spike adjustment,
are numbered.

We then compared the four samples using scaling to total
number of tags (Fig. 1, cf. A and B), scaling and quantile normalization
(Fig. 1, cf. A and C), or scaling and spike adjustment
(Fig. 1, cf. D-G). Figure 5, C through E, shows the resulting box
plots of the occupancy scores on Pol III loci in WT (green) and
Maf1 KO mice (blue), in the first (light colors) or second (dark
colors) experiments. After just scaling (panel C), the average and
median occupancy were in each case higher in the Maf1 KO
samples compared with the corresponding WT sample. However,
the average and median of the first Maf1 KO sample (mR1_KO) were
very similar to the average and median of the second WT sample
(mR2_WT; cf. the second and third box plots), making the results
difficult to interpret. Upon scaling and quantile normalization,
the distributions of all samples became similar, as expected (panel
D). In contrast, the SAP not only markedly improved the
agreement between replicates, in particular for the KO samples,
but also revealed a clear difference between the WT and KO
samples, with higher average Pol III occupancy in the KO samples
(panel E). This was also evident in the empirical cumulative distribution
function (ECDF) graphs (panels F-H), showing identical
distributions for all samples after scaling and quantile normalization
(panel G), but more similar distributions for the two WT and the two
KO samples, as well as better separation of the WT and KO sample
pairs, for the samples normalized with SAP (cf. panels F and H).

To examine the effect of scaling and quantile normalization
versus the SAP on a locus per locus basis, we performed a differential
analysis with the two sets of normalized scores. The results
are displayed as mean-difference plots in Figure 5, I and J, with the
scores showing a significant difference in the WT versus Maf1
KO samples in yellow (P < 0.01) and red (0.01 < P < 0.05). With the
scaling and quantile normalization method (panel I), 34 loci
had significantly different occupancy, but the minimum false-discovery
rate (FDR = 0.045) was close to the cutoff 0.05, and there
was a roughly equal number of loci with higher and lower scores
in the Maf1 KO compared with the WT samples. With the SAP,
490 loci scored as having significantly different Pol III occupancy,
and all but one (with a very low score) showed higher Pol III occupancy
in the KO compared with the WT samples (panel J). Thus,
the SAP both improves similarity of replicates and reveals biological
differences, even when these are quite uniform for all loci.

Figure 3. The spike chromatin can be used for quality control. Mean-difference
scatter plot of human Pol III genome bin counts (in log scale).
Red dots indicate genomic bins that overlap with Pol III loci. The genome
was binned into 400-bp bins (corresponding to a typical Pol III gene
length [~100 bp] extended by 150 bp in both the upstream and
downstream directions). Zero-count bins were filtered out prior to plotting.
(A) An example of a good-quality sample (90_R1). (B) An example of
a poor-quality sample (97.5_P1).

Figure 4. The SAP tolerates sample-to-sample differences of average chromatin fragment length. (A)
Illustration of two hypothetical cases. (Top) The mouse chromatin sample (blue) is sonicated to an
average size >500 bp; (bottom) the average size is <500 bp. The human chromatin (red) used to spike
the samples is from the same batch and has an average size of 500 bp. Size selection from 200 to 400 bp
is expected to result in a smaller proportion of mouse chromatin in the first case than in the second case.
(B) Size representation obtained by fragment analyzer (top) and 1% agarose gel electrophoresis (bottom)
of three mouse chromatin samples sonicated for 5 (S5), 10 (S10), and 15 (S15) cycles of 10 sec, as
indicated above the lanes. The position of DNA size markers (in bp) is indicated on the left. The last lane
shows the human chromatin spike sample. (C) Scatter plots showing the relation of mouse Pol III loci scores
before and after spike adjustment for the three pairs of samples sonicated for different amounts of time. The
Pearson and Spearman correlations before and after spike adjustment were as follows: 97.5_S5 versus
97.5_S10, 0.9927 → 0.9935 and 0.9678 → 0.9653; 97.5_S5 versus 97.5_S15, 0.9900 → 0.9885 and
0.9728 → 0.9663; and 97.5_S10 versus 97.5_S15, 0.9917 → 0.9926 and 0.9626 → 0.9636.



Improvement of Pol II ChIP-seq biological replicate similarity
by spike adjustment

In the examples above, we used a method to calculate preliminary
scores (Fig. 2, step 4) that is tailored to ChIP-seq experiments where
the total genomic target of the factor of interest is relatively small and
where, therefore, the tags mapping to this target represent a small
percentage of the total amount of tags aligning onto the genome, as
is the case for many factors (Landt et al. 2012). Indeed, for Pol III
occupancy, tags mapping to known targets for both human and mouse
Pol III (Canella et al. 2010, 2012; Renaud et al. 2014) represented
0.01%-1% of the total number of aligned tags (see Supplemental Table
S1A,B). To determine whether the spike adjustment method might be more
generally applicable, we applied it to chromatin samples immunoprecipitated
with anti-POLR2B antibodies, and we calculated preliminary scores around
TSSs using the SPP software
(https://sites.google.com/a/brown.edu/bioinformatics-in-biomed/spp-r-from-chip-seq)
(Kharchenko et al. 2008). The samples, referred to as RPB2_95
and RPB2_90 (see Table 1), contained different percentages of human
chromatin (which was managed in the analysis by the species-specific
scaling) (step 2 in Fig. 2) but otherwise were derived from
the same batch of mouse chromatin and processed similarly (for numbers
of tags aligned to mouse and human genomes, see Supplemental Table S2A,B)
and can thus be considered technical replicates. We calculated Pol II
scores in mouse regions extending from -250 to +250 bp around 11,217
annotated TSSs selected to be separated by at least 1000 bp from any
other annotated TSS or polyadenylation site (see Le Martelot et al. 2012).

Figure 6A, left and right panels, shows ECDF graphs of the samples after
SPP score calculation, or after SPP score calculation and spike adjustment.
The replicates were of high quality such that they were very close even
before spike adjustment. Nevertheless, spike adjustment decreased the
Kolmogorov-Smirnov distance between the two samples by more than half.
The improvement is also visible in the scatter plot in Figure 6B,
showing a tightening of the scores along the x = y line after spike
adjustment. Thus, spike adjustment performs well not only for samples
immunoprecipitated with an antibody targeting Pol III, but also for
samples immunoprecipitated with an antibody targeting Pol II. Moreover,
it can be applied to scores calculated by a method other than the one we
specifically developed for Pol III occupancy.
As further discussed below, this method is thus likely to be widely
applicable.
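
The Kolmogorov-Smirnov distance used to compare replicate score distributions is the largest vertical gap between their two empirical cumulative distribution functions (ECDFs). The following sketch, with a hypothetical function name and toy scores, shows one direct way to compute it in plain Python; it is not the code used for Figure 6.

```python
from bisect import bisect_right

def ks_distance(scores_a, scores_b):
    """Two-sample Kolmogorov-Smirnov distance: max |ECDF_a(x) - ECDF_b(x)|."""
    a, b = sorted(scores_a), sorted(scores_b)
    # The ECDFs only change at observed values, so it suffices to check those.
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )

# Toy scores: the second replicate is shifted upward by one unit.
replicate1 = [1.0, 2.0, 3.0, 4.0]
replicate2 = [2.0, 3.0, 4.0, 5.0]
distance = ks_distance(replicate1, replicate2)
```

A distance of 0 means the two ECDFs coincide, and a distance of 1 means the score ranges do not overlap at all; an adjustment that brings replicates closer should therefore lower this value.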



Discussion 

We describe a normalization method for ChIP-seq experiments
that is not confined to computational treatment of the data but
includes an experimental step, namely the addition of an internal
reference to each sample. This internal reference consists of a small
amount of chromatin (spike) from a different species than the
chromatin being tested, but a species close enough that the factors
of interest share conserved epitopes, in our case human chromatin
added to mouse chromatin. The internal reference is mixed with
the experimental sample and undergoes all experimental steps
following fragmentation of the chromatin, i.e., immunoprecipitation,
library preparation, and sequencing. The method is related, in its
principle of introducing an internal reference into each sample, to the
method recently described by Lovén et al. (2012) to normalize RNA-seq
data. In that case, a synthetic RNA standard is added to each RNA
sample to be analyzed in proportion to the starting number of cells,
thus allowing quantification of RNA relative to starting cell number
(Lovén et al. 2012).

We show that the spike signal allows quality control. Indeed, it is
in principle affected only by experimental (rather than biological)
variations, and thus allows one to pinpoint dubious experimental
samples that should be considered with circumspection and possibly
discarded. For samples passing this quality control test, the SAP both
improves similarity between replicates, without disrupting the distribution
of the data, and reliably reveals true biological differences.
Thus, on one hand, spike adjustment prevents the erroneous perception
of differences when there are no genuine differences in
protein occupancy; i.e., it reduces false-positive calls. On the other
hand, it allows reliable recognition of real differences in occupancy;
i.e., it also reduces false-negative calls.

The amount of spike material to be added to the sample
should be as low as possible while still giving a robust spike signal, so as to
contribute as few ambiguous tags as possible. This amount will
vary with sequencing depth and size of the ChIP genomic target
(for an exploration of this relationship, see Methods). We tested
adding different amounts of human "spike" chromatin to the
mouse chromatin and found that for our experiments, 2.5% was
sufficient to provide a robust spike signal (Fig. 3). It might be advantageous
to use as much as 5% spike chromatin because this may
allow the "rescue" of poorer quality experimental samples. On the
other hand, an increase in spike material might result in an increase
in the number of ambiguous tags, i.e., tags that map to both
the mouse and the human genomes, and this in turn might affect
the mouse scores, especially for lowly occupied genes near the
detection limit and in genes highly conserved in mouse and
humans, as these tags are removed from the analysis. Thus, for
analyses focusing on individual gene scores rather than on score
distributions, it may be beneficial to add the ambiguous tags to the
mouse tags, their most likely origin given that most of the starting
material is mouse chromatin, with some attention to cases where
a highly expressed spiked-in gene shares tags with a lowly
expressed mouse gene.

The spike chromatin was added to the sample chromatin after
the sonication step. Indeed, although it would in principle be
preferable to mix the two materials before sonication, the difficulty
of precisely quantifying tissue, cells, or the viscous presonication
chromatin makes it impractical. Thus, when samples with different
fragment size distributions are mixed with the same batch of
sonicated spike chromatin, the proportion of spike chromatin
fragments will differ in different samples. This is in principle corrected
by the scaling to total number of tags, as this scaling is performed
separately for the human and the mouse tags. Indeed, we
found that the SAP gave very similar results for chromatin samples
varying up to threefold in sonication time.

We tested the SAP in the study of Pol III occupancy, because this 
is one case where genome occupancy is likely to vary in a global 
manner and where current normalization methods are prone to 
failure. Indeed, Pol III transcription is, for example, elevated in 
cancer cells, and is globally diminished under certain conditions 
such as nutrient deprivation or other kinds of stress (for reviews, 
see White 2004; Goodfellow and White 2007; Gjidoda and Henry 
2013). In yeast, a global Pol III transcription decrease upon nutrient 
deprivation is accompanied by a general decrease in Pol III occu- 
pancy at Pol III loci (Roberts et al. 2003, 2006; Oficjalska-Pham et al. 
2006). Consistent with such global regulation, most known regula- 
tors of Pol III transcription act on general transcription factors used 
by all Pol III promoters such as TFIIIB or, in the case of the general 
Pol III repressor MAF1, on the polymerase itself (for reviews, see 
Geiduschek and Kassavetis 2006; Willis and Moir 2007; Ciesla and 
Boguta 2008). Indeed, we show here that deletion of the mouse Maf1
gene leads to generally increased Pol III occupancy at Pol III loci in
a tissue, the mouse liver. Such global changes in chromatin occupancy
are likely to be more common than generally appreciated. For
example, it has recently been shown that an increase in MYC protein
leads to a general "transcriptional amplification," which is accompanied
by increased MYC and Pol II occupancy at most promoters
(Lin et al. 2012; Nie et al. 2012). The SAP can make detection of such
global changes by ChIP-seq experiments more reliable.

Although we developed the SAP for the specific purpose 
of comparing Pol III occupancy under various biological condi- 
tions, the method is not limited to this particular application. 
We have also shown that the SAP improved similarity of replicate
samples for Pol II ChIP-seq scores calculated with the SPP software;
spike adjustment can thus be applied to ChIP-seq results for factors other
than Pol III and to scores calculated by different methods. Moreover,
although the SAP is in principle limited by the availability of
an antibody capable of recognizing the target of interest in different
species, such antibodies are in fact common for many factors widely
studied by ChIP-seq experiments, such as RNA polymerases and
other members of the general transcription machinery, or histones 
and their modifications, as these are in general highly conserved in 
different species. Indeed, in this work we used antibodies that rec- 
ognize human and mouse Pol III as well as human and mouse Pol II, 
and showed that for both of these factors, the method performs well. 
We have used human chromatin to spike mouse chromatin, but the 
reverse can be done, and chromatin from other species could be used 
for spiking according to needs, as long as the epitopes in the targets 



Figure 5. Spike adjustment improves similarity between replicates and reveals genuine differences in Pol III occupancy. (A,B) Scatter plots showing the
relation of Pol III loci scores between the two WT (A) and the two Maf1 KO (B) replicate samples before (orange) and after (black) spike adjustment. The red
line corresponds to x = y. (C-E) Boxplot representations of the Pol III loci score distributions for the two WT samples (light and dark green, mR1_WT and
mR2_WT) and the two Maf1 KO samples (light and dark blue, mR1_KO and mR2_KO). The scores were normalized to total number of tags aligned onto
the genome (C) followed by either quantile normalization (D) or spike adjustment (E). (F-H) Empirical cumulative frequency distribution functions
(ECDFs) of the log scores of the indicated distribution. Samples were normalized to the total number of tags aligned onto the genome (F) followed by
either quantile normalization (G) or spike adjustment (H). The Kolmogorov-Smirnov (KS) distance for the two WT (green lines) and the two Maf1 KO (blue
lines) samples is shown at the bottom right of each panel. (I,J) Mean difference scatter plots illustrating Pol III occupancy in WT and Maf1 KO livers. Samples
were normalized to the total number of tags aligned onto the genome followed by quantile normalization (I) or by spike adjustment (J). Scores
for WT and KO conditions are the average of the two replicates. Loci with scores showing a significant difference in the WT versus Maf1 KO samples are
represented with yellow (P < 0.01) and red (0.01 < P < 0.05) dots.

Figure 6. Spike adjustment improves the similarity of two Pol II ChIP-seq replicate experiments. (A) ECDFs
[...] the SPP software. The KS distance is shown at the bottom right of each panel. (Dark line) RPB2_90 sample;
(light line) RPB2_95 sample. (B) Scatter plots showing the relation between the RPB2_90 and RPB2_95 scores
before (orange dots) and after (black dots) spike adjustment. The red line corresponds to x = y.



studied are conserved. Further, when using cells or organisms ex- 
pressing tagged proteins combined with antibodies directed against 
the tags, an internal control chromatin, i.e., chromatin from cells 
expressing a chosen factor carrying the same tag, can be designed. 
The spike adjustment method should thus be widely applicable. 

Methods 
Spiked mouse ChIP 

Perfused livers from C57BL/6 mice were homogenized in 4 mL of PBS
containing 1% formaldehyde and left in the same buffer for cross-linking
for a total of 10 min. Nuclei were isolated as described in
Ripperger and Schibler (2006). Nuclear lysis was performed in 1.2 mL
of 50 mM Tris/HCl (pH 8.1), 10 mM EDTA, 1% SDS, 50 µg/mL PMSF,
1 µg/mL leupeptin. The nuclear lysate was then supplemented with
0.92 mL of 20 mM Tris/HCl (pH 8.1), 150 mM NaCl, 2 mM EDTA, 1%
Triton X-100, 0.01% SDS, 50 µg/mL PMSF, 1 µg/mL leupeptin and
sonicated with a Branson SLPe sonicator for 10 cycles of 10 sec at
50% amplitude, resulting in an average fragment size between 300
and 1000 bp. Between sonication cycles, the chromatin was kept
in an ice-cold bath for 20 sec. The samples 97.5_S5 and 97.5_S15
were sonicated with five and 15 cycles, respectively, of 10 sec each.
Chromatin samples from three mice were pooled and de-cross-linked,
and an aliquot was extracted for DNA quantification. Human HeLa cell
chromatin was prepared as described in Canella et al. (2010), and DNA
concentration was assessed.

ChIPs were performed with 30.8 µg of total DNA in the appropriate
mouse/human chromatin ratio and 10 µL of serum from a rabbit
immunized against a peptide 100% conserved in human and mouse
POLR3D (CS681 antibody, C-terminal peptide CSPDFESLLDHKHR)
(Chong et al. 2001). This antibody has been used extensively for
ChIP-seq experiments, in both human and mouse cells (Canella
et al. 2010, 2012; Renaud et al. 2014). For the anti-Pol II ChIPs, the
commercial antibody anti-POLR2B (H-201; catalog no. sc-67318,
Santa Cruz Biotechnology) recognizing human, mouse, and rat
POLR2B was used. The ChIPs were performed as described previously
in Forsberg et al. (2000) and Dhami et al. (2010) with a few
modifications. Briefly, the chromatin samples were incubated with
the antibodies overnight at 4°C. The next day, 20 µL of protein
A-sepharose beads (CL4B, GE Healthcare) was added and the samples
were further incubated for 3 h. The beads were next washed once with
20 mM Tris/HCl (pH 8.1), 50 mM NaCl, 2 mM EDTA, 1% Triton X-100,
0.1% SDS; twice with 10 mM Tris/HCl (pH 8.1), 250 mM LiCl,
1 mM EDTA, 1% NP-40, 1% deoxycholic acid; and twice with 1× TE
buffer (10 mM Tris-Cl at pH 7.5, 1 mM EDTA). Bound material was then
eluted from the beads in 300 µL of elution buffer (100 mM NaHCO3,
1% SDS), treated first with RNase A (final concentration 8 µg/mL) for 6 h
at 65°C and then with proteinase K (final concentration 345 µg/mL)
overnight at 45°C. The next day, the samples were purified with
a PCR clean-up kit from Macherey Nagel and eluted in 50 µL of
elution buffer. Sample 97.5_P1 was prepared as described above except
that 1.5% of the immunoprecipitation supernatant was added
back to the bead-eluted immunoprecipitated material.



Ultra-high-throughput sequencing 

Ten nanograms of DNA from each ChIP was next used to prepare
sequencing libraries according to the Illumina ChIP-seq DNA
sample prep protocol (Illumina, catalog no. IP-102-1001), except
that size selection of the samples was performed after, rather than
before, library amplification. Sequencing libraries were loaded
onto one lane of a HiSeq 2000 flow cell and sequenced at 100 cycles.
For each condition, we sequenced an input chromatin sample
and the corresponding ChIP sample(s).

Analysis method principle 

Samples contain a fixed amount of added-in reference (human) 
chromatin (spike). We assume that any variation in the background-adjusted
counts from this constant reference chromatin
reflects technical experimental variations and that, therefore, a scaling
factor estimated from the reference chromatin can be used to
adjust tag counts in the experimental chromatin. Tags are assigned to 
the reference (human) or the experimental (mouse) chromatin and 
analyzed separately, each according to the model below. For both, the 
input samples are used for computing background-adjusted counts. 
To simplify notation, we consider that there is only one ChIP sample 
per condition, indicated by the index k. We assume, as in Enroth et al. 
(2012), that tags in the ChIP sample come from the following sources: 
specific binding to the antibody (true enrichment), nonspecific 
binding (to the antibody and the beads), and random noise. 

The genome is partitioned into segments roughly the size of
the regions of interest. Tag counts are computed for all such genomic
segments. The probability distribution for the nonspecific
tag counts is denoted as $x_i$, where $i$ indicates a genomic segment (or
$x_{i,k}$ for segment $i$ in sample $k$). The distribution of the specific tags
for condition $k$ is denoted as $y_{i,k}$. The observed counts for the
segments in the input sample are denoted as $b_{i,k}$ and are a multiple
of $x_{i,k}$ with experimental errors $\varepsilon_{i,k}$:

$$ b_{i,k} = \gamma_k x_{i,k} + \varepsilon_{i,k}. \qquad (1) $$

For the ChIP samples, the observed counts $z_{i,k}$ are given by

$$ z_{i,k} = \alpha_k y_{i,k} + \beta_k x_{i,k} + \zeta_{i,k}, \qquad (2) $$

where $\alpha_k y_{i,k}$ are the specific tag counts corresponding to protein
occupancy scores (signal), $x_{i,k}$ is the nonspecific tag distribution, as
in Equation 1, and $\zeta_{i,k}$ is random noise. Equation 1 is used to estimate
$\beta_k x_{i,k}$ in Equation 2.

Analysis method principle: preliminary score calculation 
for Pol III data 

Our goal is to estimate the signal counts in regions of interest,
namely $\alpha_k y_{i,k}$ in Equation 2. A key assumption is that the nonspecific
segment counts in ChIP are proportional to their observed input
segment counts. When most segments are not enriched by ChIP,
this implies a linearity of segment counts in the ChIP sample
versus the input sample. As shown in Supplemental Figure S1,
this is indeed the case for our data, in which tags mapping to the
regions of interest (400-bp bins overlapping with "Pol III loci,"
i.e., annotated Pol III genes [whether occupied by Pol III or not] as
well as other previously identified Pol III-occupied loci; for the
list, see Tables S2 [human loci] and S3 [mouse loci] in Renaud et al.
2014) represent a small percentage of the total amount of tags
aligning onto the genome (0.01%-1%) (see Supplemental Table
S1A,B). To adjust for variation in the amount of specific counts in
segments of interest, i.e., here Pol III loci, we consider the signal
$\alpha_k y_{i,k}$. Formally, from Equation 2,

$$ \alpha_k y_{i,k} = z_{i,k} - \beta_k x_{i,k}, \qquad (3) $$

where $\beta_k$ is estimated from 400-bp genomic bin counts outside of
the regions being scored. In practice, using the observed background
(Equation 1) we estimate them as the positive residuals of
the regression of ChIP counts $z_{i,k}$ on input counts:

$$ w_{i,k} = \max(0, \alpha_k y_{i,k}). \qquad (4) $$



The above scoring scheme is applied to calculate preliminary signal 
counts in both human and mouse samples independently. 
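
The scoring scheme of Equations 3 and 4 can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes NumPy, an ordinary least-squares fit through the origin on background bins, and hypothetical names (`preliminary_scores`, `is_region`). Because the fit uses the observed input counts rather than the unobserved nonspecific distribution, the fitted slope plays the role of the ratio of the ChIP and input background coefficients.

```python
import numpy as np

def preliminary_scores(chip, input_counts, is_region):
    """Background-adjusted signal counts (sketch of Equations 3 and 4).

    chip, input_counts: per-bin tag counts for the ChIP and input samples.
    is_region: boolean mask, True for bins overlapping regions to be scored.
    Assumption: the ChIP-versus-input slope is fitted by ordinary least
    squares through the origin, using background bins only.
    """
    chip = np.asarray(chip, dtype=float)
    input_counts = np.asarray(input_counts, dtype=float)
    is_region = np.asarray(is_region, dtype=bool)

    bg_chip = chip[~is_region]
    bg_input = input_counts[~is_region]
    # Least-squares slope through the origin: sum(x*y) / sum(x*x).
    slope = np.dot(bg_input, bg_chip) / np.dot(bg_input, bg_input)

    residuals = chip - slope * input_counts    # estimate of the signal term
    return np.maximum(residuals, 0.0), slope   # keep only the positive part


# Toy data: four background bins where ChIP = 2 x input, then two scored bins.
chip = [20, 40, 60, 80, 150, 30]
inp = [10, 20, 30, 40, 25, 20]
mask = [False, False, False, False, True, True]
w, slope = preliminary_scores(chip, inp, mask)
```

In this toy case the first scored bin keeps a residual of 100 counts, while the second falls below the fitted background and is set to zero.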

Note that the principle of the SAP can also be performed suc- 
cessfully with simple log ratio scores (of ChIP with input, data not 
shown), as well as SPP scores (Fig. 6; Kharchenko et al. 2008). 



Analysis method principle: determination of the spike 
adjustment factor 

For the spike chromatin, we expect that background-adjusted 
counts should in principle be identical from sample to sample and 
that any difference reflects technical experimental variations. Thus, 
we use the human spike chromatin to compute a scaling factor to 
adjust for different yields in specific background-subtracted counts. 
Let $w_{j,k}$ and $w_{j,r}$ be the sets of positive residuals computed from
Equation 4 for a single sample $k$ and a reference $r$. In practice, as
reference we take the mean of positive residuals across all samples.
Then the spike-adjustment scaling factor for sample $k$ can be written
using the means of signals in spike chromatin as

$$ \eta_k = \mathrm{mean}_j(w_{j,k}) \,/\, \mathrm{mean}_j(w_{j,r}) = \sum_j w_{j,k} \Big/ \sum_j w_{j,r}, \qquad (5) $$

where the index $j$ is used instead of $i$ to indicate that only a selected
set of regions with reliable signals in the spike material is used in
Equation 5.

The adjustment is then applied to the spike material for
quality control and to the experimental chromatin to obtain adjusted
protein occupancy scores:

$$ \tilde{w}_{i,k} = w_{i,k} / \eta_k, \qquad (6) $$

where the tilde symbol (~) is used to refer to scores obtained after
spike adjustment. The $\tilde{w}_{i,k}$ values are non-negative and can be used
for analysis of relative occupancy in linear or log scale.
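
As a small worked illustration of Equations 5 and 6, the sketch below computes the scaling factor from spike (human) scores and applies it to experimental (mouse) scores. The function names and the toy numbers are hypothetical; NumPy is assumed.

```python
import numpy as np

def spike_adjustment_factor(spike_scores, spike_reference):
    """Ratio of summed spike scores in sample k to those in the reference.

    Both inputs cover the same selected spike loci, so the ratio of sums
    equals the ratio of means (Equation 5).
    """
    spike_scores = np.asarray(spike_scores, dtype=float)
    spike_reference = np.asarray(spike_reference, dtype=float)
    return spike_scores.sum() / spike_reference.sum()

def adjust_scores(experimental_scores, eta):
    """Divide the experimental scores by the spike factor (Equation 6)."""
    return np.asarray(experimental_scores, dtype=float) / eta


human_k = [12.0, 30.0, 18.0]    # spike scores in sample k (toy values)
human_ref = [8.0, 20.0, 12.0]   # reference spike scores (toy values)
eta_k = spike_adjustment_factor(human_k, human_ref)      # 60 / 40 = 1.5
mouse_adjusted = adjust_scores([3.0, 0.0, 15.0], eta_k)  # [2.0, 0.0, 10.0]
```

Dividing non-negative scores by a positive factor keeps them non-negative, so zero scores stay at zero after adjustment.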

To obtain (log) ratios between the counts in the IP sample and 
in the input sample, we use the estimator: 



$$ Lz_{i,k} = \log_2\left(\,\cdots\,\right), \qquad (7) $$



where the pseudo counts (pc) are typically set to one but can be set
higher to regularize ratios. In regions of high occupancy, $Lz_{i,k}$ is
positive, whereas in regions where $\tilde{w}_{i,k}$ is very small or zero, the log
ratio can be negative. These are the (log) scores we used in our
analysis (Figs. 4-6).



Analysis method principle: sequencing depth and spike 
percentage 

The calculation of the spike adjustment factor in Equation 5 is
based on the mean of the sum of $n$ signals in regions that, for us,
correspond to annotated Pol III genes (whether or not occupied by
Pol III) and other previously identified Pol III-occupied loci
(Renaud et al. 2014). In principle, any set of known regions
enriched in the factor of interest can be used. The efficiency of
spike adjustment depends on sequencing depth and percentage of
spike material. Here we make an estimate of the standard error of
the adjustment factor considering the random variation of tag
sampling. The adjustment factor is the ratio of the estimated
means for the regions used (Equation 5). The sum of signals used
for correction can be estimated as follows. For the data in Figure 5, the
sequencing depth is $R \approx 1.3 \times 10^8$. The spike percentage is $s = 2.5\%$,
and signal content in the entire sample, computed for our set of
700 loci, is $p \approx 0.2\%$. Therefore, the total signal count sum is
$S = R \times s \times p \approx 6500$. Of the 700 human Pol III loci studied previously
(Renaud et al. 2014), we singled out about 500 significantly occupied
loci, which account for >90% of all Pol III loci counts ($S \approx 6000$).
The mean count per locus used for adjustment is thus about
$S/n \approx 13$, but it is the precision of $S$ that determines the precision of
the adjustment factor. Under the classic Poisson assumption for
sampling error, the theoretical relative error $r = \sqrt{S}/S$ is about
0.013. For the adjustment factor, which is the ratio of two such
quantities, the relative error is the double, thus ~2.5%. To halve
the theoretical relative error, one needs about four times more sequence
tags, for example by increasing the proportion of the spike
material to $s = 10\%$. If there are more counts in known sites in the
spike chromatin, say $p = 10\%$, then one could reduce the spike
chromatin percentage or sequence less deeply. Exact conditions
need, however, to be determined for the specific parameters of
each experiment.
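
The arithmetic above can be reproduced with a few lines of Python. The values are the ones quoted in the text; the variable names are only for illustration.

```python
import math

R = 1.3e8    # sequencing depth (total tags), as quoted in the text
s = 0.025    # spike fraction (2.5%)
p = 0.002    # signal content in the scored loci (0.2%)

S_all = R * s * p       # about 6500 counts over the ~700 loci
S_used = 6000           # counts in the ~500 occupied loci (from the text)

# Poisson sampling: relative error of a count S is sqrt(S) / S.
rel_err_sum = math.sqrt(S_used) / S_used   # about 0.013
# The text doubles this for the ratio of two such sums (about 2.5%).
rel_err_factor = 2 * rel_err_sum

# Four times more counts halve the Poisson relative error.
rel_err_4x = math.sqrt(4 * S_used) / (4 * S_used)

print(f"S = {S_all:.0f}, r = {rel_err_sum:.4f}, factor error = {rel_err_factor:.3f}")
```

The same three lines can be rerun with other values of the depth, spike fraction, and signal content to explore candidate experimental designs before sequencing.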

Data analysis: tag alignment 

The 100-nt sequence tags obtained after ultra-high-throughput
sequencing were mapped onto the UCSC genome versions mentioned
in Supplemental Tables S1, A and B, and S2, A and B, via the
eland_extended mode of ELAND v2e in the Illumina CASAVA
pipeline v1.8.2. Only the tags with perfect matches, which represented
>85% of the data, were kept for the analysis. Tags sequenced
more than 50 times were given a maximum score of 50.

For the Pol III samples, counts were assigned to previously
defined lists of human and mouse annotated Pol III genes and Pol
III-occupied loci (Tables S2 and S3 in Renaud et al. 2014). For each
locus the annotated RNA-coding region (e.g., tRNA) was extended
by 150 bp on each side. One tag sequence was worth one count,
and fractional counts were attributed in the case of a partial
overlap between tag and locus. For the Pol II samples, tag counts
were attributed in the same manner to regions extended by 250
bp on each side of the 22,572 annotated RefSeq TSSs (human),
and on 11,217 annotated TSSs selected to be separated by at least
1000 bp from any other annotated TSS or polyadenylation site
(mouse) (Le Martelot et al. 2012). The total numbers of tags, with
and without redundancy, aligned onto the mouse and human
genomes, as well as the numbers of tags falling in either mouse or
human Pol III loci, are listed in Supplemental Tables S1, A and B,
and S2, A and B.
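
The counting rules above can be illustrated with a short sketch. It rests on assumptions that the text does not spell out: coordinates are treated as half-open intervals, a partially overlapping tag contributes the fraction of its length that lies inside the extended locus, and the function names are hypothetical.

```python
def capped_copies(copies, cap=50):
    """Tags sequenced more than `cap` times count as `cap`."""
    return min(copies, cap)

def fractional_overlap(tag_start, tag_end, locus_start, locus_end, flank=150):
    """Fraction of a tag [tag_start, tag_end) inside the flank-extended locus."""
    ext_start = locus_start - flank
    ext_end = locus_end + flank
    overlap = min(tag_end, ext_end) - max(tag_start, ext_start)
    if overlap <= 0:
        return 0.0
    return overlap / (tag_end - tag_start)

def locus_count(tags, locus_start, locus_end, flank=150):
    """Sum of full and fractional counts over (start, end) tag intervals."""
    return sum(
        fractional_overlap(start, end, locus_start, locus_end, flank)
        for start, end in tags
    )


# A 72-bp locus at 1000-1072 is extended to 850-1222 with 150-bp flanks.
tags = [(1000, 1100), (800, 900), (1200, 1300)]   # three 100-nt tags
count = locus_count(tags, 1000, 1072)             # 1.0 + 0.5 + 0.22
```

For the Pol II regions the same function would be called with `flank=250` around each TSS position.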

Data analysis: normalization for sequencing depth 

We normalized the mouse and human tags separately. We took the 
median of the total numbers of aligned tags across all the samples. 
We used this median as a reference total count. We then scaled bin 
counts in all samples to obtain a new total sample count equal to 
the reference total count. The typical reference total count was 150 
million tags for mouse and 3 million for human tags. The input 
samples were normalized to the same total reference count as the 
ChIP samples. 
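
A minimal sketch of this depth normalization is given below, assuming NumPy and bin counts stored per sample for one genome at a time. The function name is hypothetical, and two simplifications are assumptions of the sketch: the sample total is taken as the sum of the bin counts, and the input sample is included when the median is computed.

```python
import numpy as np

def normalize_depth(bin_counts_by_sample):
    """Scale each sample so that its total equals the median total.

    bin_counts_by_sample: dict mapping sample name -> 1-D array of bin
    counts for ONE genome; call separately for mouse and for human tags.
    """
    totals = {k: float(np.sum(v)) for k, v in bin_counts_by_sample.items()}
    reference_total = float(np.median(list(totals.values())))
    scaled = {
        k: np.asarray(v, dtype=float) * (reference_total / totals[k])
        for k, v in bin_counts_by_sample.items()
    }
    return scaled, reference_total


mouse = {
    "chip_1": [10, 30, 60],    # total 100
    "chip_2": [20, 60, 120],   # total 200
    "input": [50, 50, 200],    # total 300
}
scaled, ref = normalize_depth(mouse)   # every sample now sums to 200
```

Running the function once on mouse counts and once on human counts keeps the two genomes on independent scales, as described above.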

Data analysis: calculating preliminary scores 

For the Pol III data, we calculated scores as the non-negative residuals
of the regression of ChIP bin counts versus input bin counts
(see Equations 2-4). For the Pol II data, we calculated SPP scores
(Fig. 6). The regression coefficients from Equation 3 were estimated
based on genomic bin counts outside of the regions being scored.
For the Pol III experiments, we thus used a set of 400-bp bins
covering the genome (6,637,291 bins on the mouse genome and
7,739,205 on the human genome). For the Pol II experiments, we
used 500-bp bins (5,309,835 bins on the mouse genome and
6,191,402 on the human genome). We calculated tag counts for all
bins, for ChIP and input samples. After selecting bins that did not
overlap with the regions to be scored, we performed a robust linear
regression on ChIP versus input. We used the regression coefficients
to compute $\alpha_k y_{i,k}$ in regions to be scored. The values $w_{i,k}$,
which were background-adjusted, estimated the counts due to
specific immunoprecipitation.
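
The text does not state which robust regression was used. As one possible illustration, the sketch below fits a slope through the origin by iteratively reweighted least squares with Huber weights; the estimator, the tuning constant 1.345, and the function name are all assumptions of this sketch, not the authors' method. NumPy is assumed.

```python
import numpy as np

def robust_slope(x, y, delta=1.345, n_iter=50):
    """Slope of y ~ slope * x (no intercept) by IRLS with Huber weights."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope = np.dot(x, y) / np.dot(x, x)     # ordinary least-squares start
    for _ in range(n_iter):
        resid = y - slope * x
        # Robust residual scale from the median absolute deviation.
        scale = 1.4826 * np.median(np.abs(resid - np.median(resid)))
        if scale == 0:
            break
        u = np.abs(resid) / (delta * scale)
        weights = 1.0 / np.maximum(u, 1.0)  # 1 inside the threshold, 1/u outside
        new_slope = np.sum(weights * x * y) / np.sum(weights * x * x)
        converged = abs(new_slope - slope) < 1e-10
        slope = new_slope
        if converged:
            break
    return slope


# Toy background bins: ChIP = 2 x input, plus one contaminating enriched bin.
x = np.arange(1, 21, dtype=float)
y = 2.0 * x
y[-1] += 200.0
slope_ols = np.dot(x, y) / np.dot(x, x)   # pulled upward by the outlier
slope_robust = robust_slope(x, y)         # stays close to 2
```

The point of the robust fit is that a few enriched bins left among the background bins do not inflate the background slope, which would otherwise shrink the residual scores.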



Data analysis: calculation of the spike adjustment factor 
and score adjustment 

We used either a subset or all of the scored regions in the spike
chromatin to calculate the score adjustment factor. Subselecting is
inherent to our scoring method, since we select positive residuals
only and set negative residuals to zero. Depending on the data
analyzed, it might be appropriate to take upper quantiles or use
a threshold score. The spike adjustment scaling factor ($\eta$) between two
samples $k$ and $r$ was then computed as (Equation 5)

$$ \eta_k = \sum_j w_{j,k} \Big/ \sum_j w_{j,r}, $$

where the $w$'s were the preliminary scores in the human
regions $j$ for these two samples. The index $r$ here indicates a "reference"
sample. The spike adjustment factor $\eta_k$ was then applied
as scaling factor to adjust the corresponding mouse gene scores:
$\tilde{w}_{i,k} = w_{i,k}/\eta_k$. The example above considers the spike adjustment
of one sample $k$ with respect to a reference sample $r$. In practice,
we adjusted multiple samples together. In our analysis and all
figures, the reference was taken to be the mean of means of scores
in all samples.
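
Adjusting several samples together against a mean-of-means reference can be sketched as follows. The array layout (one row per sample) and the function name are assumptions made for this illustration; NumPy is assumed.

```python
import numpy as np

def spike_adjust_all(spike_scores, experimental_scores):
    """Adjust all samples using the mean of per-sample spike means as reference.

    spike_scores: array of shape (n_samples, n_spike_loci), human scores.
    experimental_scores: array of shape (n_samples, n_loci), mouse scores.
    """
    spike_scores = np.asarray(spike_scores, dtype=float)
    experimental_scores = np.asarray(experimental_scores, dtype=float)
    sample_means = spike_scores.mean(axis=1)   # mean spike score per sample
    reference = sample_means.mean()            # mean of means over samples
    eta = sample_means / reference             # one factor per sample
    adjusted = experimental_scores / eta[:, None]
    return adjusted, eta


human = [[10.0, 30.0], [20.0, 60.0], [30.0, 90.0]]   # toy spike scores
mouse = [[5.0, 0.0], [10.0, 4.0], [15.0, 6.0]]       # toy mouse scores
adjusted, eta = spike_adjust_all(human, mouse)        # eta = [0.5, 1.0, 1.5]
```

With this choice of reference the factors average to one across samples, so the adjusted scores stay on roughly the same overall scale as the unadjusted ones.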

Data analysis: final scores for follow-up analysis 

After spike adjustment, for final quantification we used the adjusted
log ratios, as shown in Equation 7. The adjusted gene scores
$\tilde{w}$ are still in linear scale. We re-added the estimated background
and then took a log ratio with the observed background. These
final scores were used in all figures in this manuscript.
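
The verbal description above (re-add the estimated background, then take a log ratio with the observed background) can be sketched as below. The exact estimator is not reproduced here; in particular, adding the pseudo count to both numerator and denominator is an assumption of this sketch, as are the function and argument names. NumPy is assumed.

```python
import numpy as np

def final_log_scores(adjusted_signal, estimated_background,
                     observed_background, pc=1.0):
    """log2 of (adjusted signal + estimated background) over observed background.

    Assumption: the pseudo count `pc` is added to numerator and denominator.
    """
    numerator = (np.asarray(adjusted_signal, dtype=float)
                 + np.asarray(estimated_background, dtype=float) + pc)
    denominator = np.asarray(observed_background, dtype=float) + pc
    return np.log2(numerator / denominator)


# First locus: strong signal. Second locus: no signal and an observed
# background above the estimated one, which gives a negative log ratio.
scores = final_log_scores([60.0, 0.0], [3.0, 3.0], [3.0, 7.0])
```

Raising `pc` pulls ratios computed from small counts toward zero, which is the regularizing effect mentioned for the pseudo counts.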

Data access 

The data from this study have been submitted to the NCBI Gene 
Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/geo) 
under accession number GSE52049. 

The CycliX Consortium 

Nouria Hernandez, 1 Mauro Delorenzi, 2,3,4 Bart Deplancke, 5 
Beatrice Desvergne, 1 Nicolas Guex, 6 Winship Herr, 1 Felix Naef, 5 
Jacques Rougemont, 7 Ueli Schibler, 8 Teemu Andersin, 8 Pascal 
Cousin, 1 Federica Gilardi, 1 Pascal Gos, 8 Fabienne Lammers, 1 Sunil 
Raghav, 5 Dominic Villeneuve, 1 Roberto Fabbretti, 6 Volker Vlegel, 6 
Ioannis Xenarios, 1,2,6 Eugenia Migliavacca, 1,6 Viviane Praz, 1,2 
Fabrice David, 2,7 Yohan Jarosz, 2,7 Dmitry Kuznetsov, 6 Robin 
Liechti, 6 Olivier Martin, 6 Julien Delafontaine, 2,7 Julia Cajan, 5 Kyle 
Gustafson, 1 Irina Krier, 5 Marion Leleu, 2,7 Nacho Molina, 5 Aurelien 
Naldi, 7 Leonor Rib, 1 Laura Symul, 5 and Gergana Bounova 1,2 



1 Center for Integrative Genomics, Faculty of Biology and Medicine, University
of Lausanne, 1015 Lausanne, Switzerland
2 Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
3 Bioinformatics Core Facility, Swiss Institute of Bioinformatics, 1015 Lausanne,
Switzerland
4 Department of Oncology and Ludwig Center for Cancer Research, Faculty of
Biology and Medicine, University of Lausanne, 1011 Lausanne, Switzerland
5 Interfaculty Institute of Bioengineering, School of Life Sciences, Ecole
Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
6 Vital-IT, Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
7 Bioinformatics and Biostatistics Core Facility, School of Life Sciences, Ecole
Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
8 Department of Molecular Biology, Faculty of Sciences, University of Geneva,
1211 Geneva, Switzerland



Genome Research 1167

www.genome.org 



Bonhoure et al. 



Acknowledgments 

We thank Michael Wiederkehr for his assistance. We thank Keith 
Harshman, Director of the Lausanne Technologies Facility, where 
all the ultra-high-throughput sequencing was performed, and 
Ioannis Xenarios, Director of the Vital-IT (http://www.vital-it.ch) 
Center for High Performance Computing of the Swiss Institute 
of Bioinformatics. Maintenance of the CycliX servers was pro- 
vided by Vital-IT. This work was financed by CycliX, a grant 
from the Swiss SystemsX.ch initiative evaluated by the Swiss 
National Science Foundation, Sybit, the SystemsX.ch IT unit, 
SNSF grant 31003A_132958 to N.H., and the University of 
Lausanne. 



References 

Barski A, Cuddapah S, Cui K, Roh TY, Schones DE, Wang Z, Wei G, Chepelev
I, Zhao K. 2007. High-resolution profiling of histone methylations in the
human genome. Cell 129: 823-837.

Canella D, Praz V, Reina JH, Cousin P, Hernandez N. 2010. Defining the RNA
polymerase III transcriptome: genome-wide localization of the RNA
polymerase III transcription machinery in human cells. Genome Res 20:
710-721.

Canella D, Bernasconi D, Gilardi F, Le Martelot G, Migliavacca E, Praz V,
Cousin P, Delorenzi M, Hernandez N. 2012. A multiplicity of factors
contributes to selective RNA polymerase III occupancy of a subset of
RNA polymerase III genes in mouse liver. Genome Res 22: 666-680.

Chong SS, Hu P, Hernandez N. 2001. Reconstitution of transcription from
the human U6 small nuclear RNA promoter with eight recombinant
polypeptides and a partially purified RNA polymerase III complex. J Biol
Chem 276: 20727-20734.

Ciesla M, Boguta M. 2008. Regulation of RNA polymerase III transcription
by Maf1 protein. Acta Biochim Pol 55: 215-225.

Desai N, Lee J, Upadhya R, Chu Y, Moir RD, Willis IM. 2005. Two steps in
Maf1-dependent repression of transcription by RNA polymerase III.
J Biol Chem 280: 6455-6462.

Dhami P, Bruce AW, Jim JH, Dillon SC, Hall A, Cooper JL, Bonhoure N,
Chiang K, Ellis PD, Langford C, et al. 2010. Genomic approaches
uncover increasing complexities in the regulatory landscape at the
human SCL (TAL1) locus. PLoS ONE 5: e9059.

Enroth S, Andersson CR, Andersson R, Wadelius C, Gustafsson MG,
Komorowski J. 2012. A strand specific high resolution normalization
method for chip-sequencing data employing multiple experimental
control measurements. Algorithm Mol Biol 7: 2.

Forsberg EC, Downs KM, Christensen HM, Im H, Nuzzi PA, Bresnick EH. 2000.
Developmentally dynamic histone acetylation pattern of a tissue-specific
chromatin domain. Proc Natl Acad Sci 97: 14494-14499.

Geiduschek EP, Kassavetis GA. 2006. Transcription: adjusting to adversity by
regulating RNA polymerase. Curr Biol 16: R849-R851.

Gjidoda A, Henry RW. 2013. RNA polymerase III repression by the
retinoblastoma tumor suppressor protein. Biochim Biophys Acta 1829:
385-392.

Goodfellow SJ, White RJ. 2007. Regulation of RNA polymerase III
transcription during mammalian cell growth. Cell Cycle 6: 2323-2326.

Johnson DS, Mortazavi A, Myers RM, Wold B. 2007a. Genome-wide
mapping of in vivo protein-DNA interactions. Science 316: 1497-1502.

Johnson SS, Zhang C, Fromm J, Willis IM, Johnson DL. 2007b. Mammalian
Maf1 is a negative regulator of transcription by all three nuclear RNA
polymerases. Mol Cell 26: 367-379.

Kharchenko PV, Tolstorukov MY, Park PJ. 2008. Design and analysis of
ChIP-seq experiments for DNA-binding proteins. Nat Biotechnol 26: 1351-1359.

Landt SG, Marinov GK, Kundaje A, Kheradpour P, Pauli F, Batzoglou S,
Bernstein BE, Bickel P, Brown JB, Cayting P, et al. 2012. ChIP-seq
guidelines and practices of the ENCODE and modENCODE consortia.
Genome Res 22: 1813-1831.

Le Martelot G, Canella D, Symul L, Migliavacca E, Gilardi F, Liechti R, Martin
O, Harshman K, Delorenzi M, Desvergne B, et al. 2012. Genome-wide
RNA polymerase II profiles and RNA accumulation reveal kinetics of
transcription and associated epigenetic changes during diurnal cycles.
PLoS Biol 10: e1001442.

Li QH, Brown JB, Huang HY, Bickel PJ. 2011. Measuring reproducibility of
high-throughput experiments. Ann Appl Stat 5: 1752-1779.

Lin CY, Loven J, Rahl PB, Paranal RM, Burge CB, Bradner JE, Lee TI, Young
RA. 2012. Transcriptional amplification in tumor cells with elevated
c-Myc. Cell 151: 56-67.

Loven J, Orlando DA, Sigova AA, Lin CY, Rahl PB, Burge CB, Levens DL, Lee
TI, Young RA. 2012. Revisiting global gene expression analysis. Cell 151:
476-482.

Mikkelsen TS, Ku M, Jaffe DB, Issac B, Lieberman E, Giannoukos G, Alvarez P,
Brockman W, Kim TK, Koche RP, et al. 2007. Genome-wide maps of
chromatin state in pluripotent and lineage-committed cells. Nature 448:
553-560.

Nie Z, Hu G, Wei G, Cui K, Yamane A, Resch W, Wang R, Green DR, Tessarollo
L, Casellas R, et al. 2012. c-Myc is a universal amplifier of expressed genes
in lymphocytes and embryonic stem cells. Cell 151: 68-79.

Oficjalska-Pham D, Harismendy O, Smagowicz WJ, Gonzalez de Peredo A,
Boguta M, Sentenac A, Lefebvre O. 2006. General repression of RNA
polymerase III transcription is triggered by protein phosphatase type
2A-mediated dephosphorylation of Maf1. Mol Cell 22: 623-632.

Pluta K, Lefebvre O, Martin NC, Smagowicz WJ, Stanford DR, Ellis SR, Hopper
AK, Sentenac A, Boguta M. 2001. Maf1p, a negative effector of RNA
polymerase III in Saccharomyces cerevisiae. Mol Cell Biol 21: 5031-5040.

Rahl PB, Lin CY, Seila AC, Flynn RA, McCuine S, Burge CB, Sharp PA, Young RA.
2010. c-Myc regulates transcriptional pause release. Cell 141: 432-445.

Reina JH, Azzouz TN, Hernandez N. 2006. Maf1, a new player in the
regulation of human RNA polymerase III transcription. PLoS ONE 1: e134.

Renaud M, Praz V, Vieu E, Florens L, Washburn MP, L'Hote P, Hernandez N.
2014. Gene duplication and neofunctionalization: POLR3G and
POLR3GL. Genome Res 24: 37-51.

Ripperger J, Schibler U. 2006. Rhythmic CLOCK-BMAL1 binding to
multiple E-box motifs drives circadian Dbp transcription and chromatin
transitions. Nat Genet 38: 369-374.

Roberts DN, Stewart AJ, Huff JT, Cairns BR. 2003. The RNA polymerase III
transcriptome revealed by genome-wide localization and
activity-occupancy relationships. Proc Natl Acad Sci 100: 14695-14700.

Roberts DN, Wilson B, Huff JT, Stewart AJ, Cairns BR. 2006. Dephosphorylation
and genome-wide association of Maf1 with Pol III-transcribed genes
during repression. Mol Cell 22: 633-644.

Rollins J, Veras I, Cabarcas S, Willis I, Schramm L. 2007. Human Maf1
negatively regulates RNA polymerase III transcription via the TFIIB
family members Brf1 and Brf2. Int J Biol Sci 3: 292-302.

Upadhya R, Lee J, Willis IM. 2002. Maf1 is an essential mediator of diverse
signals that repress RNA polymerase III transcription. Mol Cell 10:
1489-1494.

Vannini A, Ringel R, Kusser AG, Berninghausen O, Kassavetis GA, Cramer P.
2010. Molecular basis of RNA polymerase III transcription repression by
Maf1. Cell 143: 59-70.

White RJ. 2004. RNA polymerase III transcription and cancer. Oncogene 23:
3208-3216.

Willis IM, Moir RD. 2007. Integration of nutritional and stress signaling
pathways by Maf1. Trends Biochem Sci 32: 51-53.



Received October 11, 2013; accepted in revised form March 31, 2014. 






